
Add thinking support for claude-3-7-sonnet #443

Merged: 1 commit merged into open-webui:main on Mar 6, 2025

Conversation

mikeyobrien

Add support for extended thinking for Claude 3.7 Sonnet

https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking


@roryeckel

roryeckel commented Feb 25, 2025

The chat is working so well for me. However, if I enable reasoning mode, the title and tag generation fail.
Possibly relevant log: pipelines | Unexpected data structure: 'text'
pipelines | Full data: {'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'thinking', 'thinking': '', 'signature': ''}}

@roryeckel

roryeckel commented Feb 25, 2025

Should reasoning be enabled for title & tag generation?
I think it should respect the enable_thinking, which it does! Just thinking out loud.

@roryeckel

The issue is caused by content[0] in the synchronous call being the thinking block instead of the text block.
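
(For illustration, an assumed response shape based on the log above rather than code from this PR: with extended thinking enabled the thinking block comes first in content, so reading content[0]["text"] fails.)

# Assumed shape of the non-streaming response when thinking is enabled:
res = {
    "content": [
        {"type": "thinking", "thinking": "...model reasoning...", "signature": "..."},
        {"type": "text", "text": "the actual answer"},
    ]
}
# content[0] is the thinking block; the text block has to be looked up by type.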

@roryeckel

roryeckel commented Feb 25, 2025

After I tried this:

    def get_completion(self, payload: dict) -> str:
        response = requests.post(self.url, headers=self.headers, json=payload)
        if response.status_code == 200:
            res = response.json()
            print(res)
            if "content" in res and (res_content := res["content"]):
                # Find the first content item that contains text
                for content_item in res_content:
                    if content_item.get("type") == "text" and "text" in content_item and (text := content_item["text"]):
                        return text
            return ""
        else:
            raise Exception(f"Error: {response.status_code} - {response.text}")

then tag generation worked. But title generation did not; I didn't even see the request for it somehow...?

@thiswillbeyourgithub

Does that PR allow setting the reasoning effort?

Also, does it work if I'm using OpenRouter to call Claude?


@arty-hlr arty-hlr left a comment


I would recommend changing the valve to a boolean; it looks better in the UI when setting valves, since it becomes a simple toggle instead of a string.
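
(For illustration, a boolean valve could look like the sketch below; ENABLE_THINKING is a hypothetical name, the point is just that a bool renders as a toggle in the valve UI.)

from pydantic import BaseModel

class Valves(BaseModel):
    ANTHROPIC_API_KEY: str = ""
    # A bool shows up as a simple on/off toggle instead of a free-form string.
    ENABLE_THINKING: bool = False  # hypothetical name for the thinking toggle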

@arty-hlr

@thiswillbeyourgithub Yes, and no. There is a manifold to support reasoning on openrouter (https://github.com/rmarfil3/openwebui-openrouter-reasoning-tokens), but so far I haven't been able to make it work with Claude 3.7; it doesn't display the thinking.

@mikeyobrien
Author

@arty-hlr made the requested changes, switching from string to bool.

> Should reasoning be enabled for title & tag generation? I think it should respect the enable_thinking, which it does! Just thinking out loud.

@roryeckel you were on the right track; I made the changes to only enable reasoning when streaming, and now both title and tag generation are working.

@reasv

reasv commented Feb 25, 2025

I think the amount of reasoning should not just be a "global" setting. You may want to change it on a per-query basis.

The "reasoning_effort" parameter should be used for this. If reasoning is globally enabled in the pipeline, this parameter determines how much reasoning to apply (should be possible to also set it to 0/None to disable reasoning in a specific chat, etc)

Since this parameter is a string, I would recommend supporting both integer values to specify the exact amount of reasoning tokens, as well as a few names for "breakpoints", for example:

{
    "none": None,  # Don't enable thinking
    "low": 1024,   # Minimum budget
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}
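
(A minimal sketch of that parsing, under the assumption it's a plain lookup plus an int() fallback; resolve_budget_tokens is a hypothetical helper, not code from this PR.)

REASONING_EFFORT_BUDGET_TOKEN_MAP = {
    "none": None,   # don't enable thinking
    "low": 1024,    # minimum budget
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}

def resolve_budget_tokens(reasoning_effort):
    """Accept either a named breakpoint or an exact integer token count."""
    effort = str(reasoning_effort or "none").strip().lower()
    if effort in REASONING_EFFORT_BUDGET_TOKEN_MAP:
        return REASONING_EFFORT_BUDGET_TOKEN_MAP[effort]
    try:
        return int(effort)   # e.g. "8000" -> exactly 8000 thinking tokens
    except ValueError:
        return None          # unknown value: treat as thinking disabled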


@roryeckel

I definitely agree with the reasoning effort part. Perhaps instead we define how many tokens are low, medium, and high in the pipeline valves. Then, use the native Open WebUI reasoning effort to map into the token budget.

@reasv

reasv commented Feb 25, 2025

Yeah that could work.
Would avoid hardcoding any values

@mikeyobrien
Author

I like this approach. Makes sense given the effort is logarithmic to the amount of budget tokens.

@mikeyobrien
Author

@reasv Made the changes to remove global configuration of thinking and moved to use the reasoning effort param.

@roryeckel

roryeckel commented Feb 25, 2025

In my local copy I refactored these latest changes like so:

import os

from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        ANTHROPIC_API_KEY: str = ""
        LOW_REASONING_EFFORT_TOKENS: int = 1024
        MEDIUM_REASONING_EFFORT_TOKENS: int = 4096
        HIGH_REASONING_EFFORT_TOKENS: int = 16384
        MAX_REASONING_EFFORT_TOKENS: int = 32768

    def __init__(self):
        self.type = "manifold"
        self.id = "anthropic"
        self.name = "anthropic/"

        self.valves = self.Valves(
            **{
                "ANTHROPIC_API_KEY": os.getenv(
                    "ANTHROPIC_API_KEY", "your-api-key-here"
                ),
            }
        )
        self.reasoning_effort_map = {
            'none': None,
            'low': self.valves.LOW_REASONING_EFFORT_TOKENS,
            'medium': self.valves.MEDIUM_REASONING_EFFORT_TOKENS,
            'high': self.valves.HIGH_REASONING_EFFORT_TOKENS,
            'max': self.valves.MAX_REASONING_EFFORT_TOKENS
        }

(Then replaced the other usages of REASONING_EFFORT_BUDGET_TOKEN_MAP with self.reasoning_effort_map - sorry I'm not a contributor to pipelines yet so I'm not sure how to propose these changes)

This makes it configurable in the UI.

However, I noticed a bug with both of our changes. The UI reports "medium" reasoning level for DEFAULT, but when using this setting the reasoning effort is actually NONE. Would this bug be on the side of Open WebUI not passing "medium" when it's DEFAULT, or would it be on the pipeline side? Personally, I'm leaning towards "bug in Open WebUI" rather than "bug in this pipeline", because if we treat NONE as MEDIUM, then how would we ever be able to use NONE? I will submit a PR upstream if you agree.

@tjbck
Collaborator

tjbck commented Feb 26, 2025

Ready to be merged?

@mikeyobrien
Author

@roryeckel I think it's just a case of odd UX. From my understanding, the "medium" text is just a placeholder example of what could potentially be entered there and does not take effect until an actual setting is made.

It makes sense that the default is none, given that most models until recently had no concept of reasoning effort.

@mikeyobrien
Author

@tjbck I'm okay with merging this for now and leaving configurable effort to budget tokens in a follow up PR.

@arty-hlr

I just tested it; for me only low and none work as reasoning_effort. medium, high, and max just hang and don't even display "thinking".

That aside, showing "medium" as the default value even though it's actually none is very confusing; I think people will just assume it's broken, so I wouldn't merge in this state.

@roryeckel

roryeckel commented Feb 26, 2025

> I just tested it; for me only low and none work as reasoning_effort. medium, high, and max just hang and don't even display "thinking".
>
> That aside, showing "medium" as the default value even though it's actually none is very confusing; I think people will just assume it's broken, so I wouldn't merge in this state.

My current opinion is that OpenWebUI's reasoning_effort was implemented without consideration for models that can reason but DON'T by default. Remember, the reference implementation was O3, which HAS to reason, so the issues with "none" wouldn't have been caught.

@mikeyobrien
Author

mikeyobrien commented Feb 26, 2025

@arty-hlr the problem is likely that the medium and high budgets are greater than your max_tokens param. If max_tokens < budget_tokens, the output stream hangs. Try raising max_tokens above the mapped budget_tokens value.

I haven't looked in detail into a mechanism to surface the error messages to the user.

@arty-hlr

@mikeyobrien I'll try that, but I'd say this is still not ready for merge as an example in this state.

@arty-hlr

@mikeyobrien I'm confused: in the model settings or controls, that parameter is 128 by default, which doesn't make sense. I always thought those settings made zero sense in open-webui, tbh; what's set in the model advanced settings is not reflected in the controls during chat, and it's just such a mess...

I would be in favor of not setting this with reasoning effort and going back to a valve. As it is, with the current state of those parameters in open-webui, this is unfortunately not usable... :(

@roryeckel

roryeckel commented Feb 27, 2025

> that parameter is 128 by default

AdvancedParams.svelte:

I thought it was 2048? By the way, all these strings are hard-coded, so perhaps an acceptable fix for now would be to change 'medium' to simply not have a default, or to say the default is provider-specific.

@arty-hlr

@roryeckel This is context size; I believe mike was talking about max tokens.

Or is it so confusing that the names don't make sense? idk

@roryeckel

roryeckel commented Feb 27, 2025

> Or is it so confusing that the names don't make sense? idk

Oops, that was my confusion. However, it's still the same story about the hard-coding. About your question from earlier, I think AdvancedParams is a sort of "temporary override" for the defaults that come from the model. You can pretty much ignore the defaults that AdvancedParams shows you, because the value "truly" comes from the model.

Btw, "low" effort is way more than 128 tokens, so I don't think it's num_predict; I think it's Context Length that needs to be changed to fix the freezing issue, though.

@mikeyobrien
Author

This is the issue I see in logs:

Exception: Error: 400 - {"type":"error","error":{"type":"invalid_request_error","message":"`max_tokens` must be greater than `thinking.budget_tokens`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#max-tokens-and-context-window-size"}}

I'm opposed to reverting to the global valve setting, since there would then be no mechanism to turn off thinking per-chat. Ideally the error is surfaced to the user.

@reasv

reasv commented Feb 27, 2025

I am also opposed to making the value global. It needs to be settable per-conversation at least.
But this issue really shouldn't be a big deal. The script can check what the max tokens and thinking tokens are, on its own, can it not?

This is not something where we must wait for an error. We can read the settings and see if there is a contradiction.

Either

  1. Automatically and transparently add +thinking_tokens to max_tokens (so max tokens refers to the number of actual response tokens, as it normally does in any other scenario)
  2. throw an error before sending the request if the thinking tokens exceed max_tokens
  3. reduce thinking tokens to fit within max_tokens

I am for solution #1. We can even let users turn it on/off with a valve.
It could also be conditional: if max_tokens is already big enough, don't add anything to it.
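
(A minimal sketch of what solution 1 could look like, using the thinking parameter from the Anthropic extended-thinking docs; apply_thinking and the 4096 default are assumptions for illustration, not the merged code.)

def apply_thinking(payload, budget_tokens):
    """Option 1: grow max_tokens so it keeps meaning 'response tokens' only."""
    if budget_tokens:
        payload["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
        # Anthropic requires max_tokens > thinking.budget_tokens, so only grow
        # it when it is not already big enough (the conditional variant above).
        max_tokens = payload.get("max_tokens", 4096)
        if max_tokens <= budget_tokens:
            payload["max_tokens"] = max_tokens + budget_tokens
    return payload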

@tjbck
Collaborator

tjbck commented Feb 27, 2025

Agreed, the default values in the tooltips should be removed entirely.

@arty-hlr

Yeah, I understand that it would be nice to be able to turn thinking on/off or change how much thinking Claude 3.7 does in the chat. I would agree with solution 1 as well.

@mikeyobrien
Author

Alright, made the following changes:

  1. max_tokens becomes max_tokens + budget_tokens when reasoning is enabled
  2. error messages are now surfaced to inform the user of any issues instead of the stream hanging.

@arty-hlr

Looks good! :)

@LionsAd

LionsAd commented Feb 28, 2025

Nice!

print(f"Full data: {data}")
else:
raise Exception(f"Error: {response.status_code} - {response.text}")
"""Used for title and tag generation"""

Wrong function? I think this should be on get_completion ...


@LionsAd LionsAd left a comment


Except for my one nit:

Works great! (I ported the code over to the function to test it)

So FWIW, "approving of these changes".

):
    try:
        budget_tokens = int(reasoning_effort)
    except ValueError as e:

@LionsAd LionsAd Feb 28, 2025


Suggested change:
- except ValueError as e:
+ except (ValueError, TypeError) as e:


The reason for that is that reasoning_effort can be NoneType, and then this fails with:

ERROR [open_webui.functions] Error: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

We could also check that reasoning_effort is not NoneType. (No idea why body.get() in that case doesn't apply the default value.)
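
(A short self-contained sketch of why catching TypeError matters; to_budget_tokens and the None fallback are assumptions for illustration.)

def to_budget_tokens(reasoning_effort):
    # int(None) raises TypeError and int("medium") raises ValueError, so the
    # broader except clause also covers the DEFAULT/None case described above.
    try:
        return int(reasoning_effort)
    except (ValueError, TypeError):
        return None  # assumed fallback: no explicit thinking budget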

@LionsAd

LionsAd commented Feb 28, 2025

I also found that defaults are not applied properly.

If I set the model to default, reasoning_effort is None - I think because default means to use whatever the pipe considers as default.

But the default says "medium", so it's a little bit confusing.

I found the discussion above.

My take is we can distinguish between "None" (as in NoneType) and "none" as in the string none.

That all said: This really should be a button to "Think" or not "Think" in OpenWeb UI - like in Grok.

So it's kind of an interesting case.

@LionsAd

LionsAd commented Feb 28, 2025

I opened a discussion here to add a "Think" button:

open-webui/open-webui#11006

@LionsAd

LionsAd commented Mar 1, 2025

Now with async and logging (which might be helpful here as well as it makes it super easy to see what is going on):

https://gist.github.com/LionsAd/ed81504e2663dcf33a3d2efc2f9a31f4

@shabie

shabie commented Mar 2, 2025

None of this handles Claude on AWS Bedrock, right...?

@LionsAd

LionsAd commented Mar 2, 2025

"None of this handles the claude with AWS Bedrock right..?"

In theory it could, because AWS Bedrock just speaks the Claude API (for all the Claude models), but you need a special signing key via the boto3 client to make the request to the URL, so it's always a new token.

That said, you can just set up the AWS Bedrock Access Gateway to get an OpenAI-compatible gateway.

Btw, the way they translate "reasoning_effort" into a budget in the gateway is by taking a percentage of the max tokens available for the request.
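
(Roughly sketched, purely as an illustration; the ratios below are placeholders, the gateway's actual percentages live in its repo.)

EFFORT_RATIO = {"low": 0.2, "medium": 0.4, "high": 0.6}  # placeholder values

def effort_to_budget(reasoning_effort, max_tokens):
    """Derive a thinking budget as a share of the request's max_tokens."""
    return int(max_tokens * EFFORT_RATIO.get(reasoning_effort, 0.4))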

You can find the gateway here:

https://github.com/aws-samples/bedrock-access-gateway

If you run OpenWeb UI without Docker, you can also run the gateway in a Python "venv":

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

then start it.

@lentil32

lentil32 commented Mar 3, 2025

For AWS Bedrock users, please see my bare minimum implementation based on @mikeyobrien 's pipeline: https://github.com/lentil32/anthropic_manifold_pipeline_aws_bedrock/

@krittaprot

@LionsAd I just updated the bedrock-access-gateway. After this pull request is merged, do you think I will be able to use claude-3.7-sonnet's thinking capability through the OpenAI API inside Open-WebUI? I really hope it will work without any additional pipeline or function...

@LionsAd

LionsAd commented Mar 3, 2025

@krittaprot Even though it's off-topic, right now I am not seeing the <think> tags in OpenWeb UI:

aws-samples/bedrock-access-gateway#117 fixes it.

@tjbck
Collaborator

tjbck commented Mar 6, 2025

Thanks!

@tjbck tjbck merged commit f89ab37 into open-webui:main Mar 6, 2025